You are viewing the RapidMiner Studio documentation for version 9.10.

Split Document into Collection (Operator Toolbox)

Synopsis

This operator splits a document (for example, one produced by Read Document) into a collection of documents according to the split string parameter.

Description

This operator receives a document at its input port and splits it into a collection of documents according to the split string parameter. The input document can come, for example, from a Read Document operator, which reads a complete text file and provides its content as a single document. You can use the Split Document into Collection operator to split this document into a collection and then process the resulting documents one by one. For example, to process a file line by line, use the end-of-line character (\n) as the split string.
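The line-by-line case can be pictured with a short Python sketch. It is not the operator's implementation: it assumes the split string is matched literally (not as a regular expression) and that empty pieces, such as the one after a trailing line break, are kept, as Python's `str.split` does. The function name `split_document` is hypothetical.

```python
def split_document(text: str, split_string: str) -> list[str]:
    """Split one document's text into a collection of documents.

    Assumption: the split string is matched literally and empty pieces are kept,
    which is how Python's str.split behaves; the operator may differ.
    """
    if split_string == "":
        raise ValueError("split_string must not be empty")
    return text.split(split_string)


document = "first line\nsecond line\nthird line"
collection = split_document(document, "\n")  # one document per line
for doc in collection:  # process the documents one by one
    print(doc)
```

Whether the real operator drops empty trailing pieces is not stated on this page, so inputs that end with a line break may behave differently.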

The split documents are also converted into an ExampleSet with a single attribute that holds the documents, one document per example.
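The shape of that ExampleSet (one attribute, one example per document) can be sketched with plain Python rows. The attribute name `text` and the helper `collection_to_rows` are hypothetical placeholders; the attribute name the operator actually uses is not given here.

```python
def collection_to_rows(collection: list[str], attribute: str = "text") -> list[dict[str, str]]:
    """Turn a collection of documents into table rows with a single attribute.

    "text" is a hypothetical attribute name chosen for illustration.
    """
    # Each split document becomes one example (row) with exactly one attribute.
    return [{attribute: doc} for doc in collection]


rows = collection_to_rows(["first line", "second line", "third line"])
for row in rows:
    print(row)
```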

Input

  • document

    The input document.

Output

  • collection (Collection)

    The resulting collection of documents.

  • example set (Data Table)

    An ExampleSet containing the split documents as an attribute. Each document is one example.

Parameters

  • split_string: The string on which the input document is split. The split string itself is not included in the resulting documents.
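The split string does not have to be a single character. The Python sketch below uses a hypothetical multi-character separator to show that the separator disappears from every resulting piece. It assumes literal matching; this page does not say whether the operator treats the split string as a regular expression, in which case characters such as `.` or `|` would behave differently.

```python
records = "id=1\n---\nid=2\n---\nid=3"
split_string = "\n---\n"  # hypothetical multi-character separator

pieces = records.split(split_string)
# The separator itself does not appear in any resulting piece.
assert all(split_string not in piece for piece in pieces)
print(pieces)  # ['id=1', 'id=2', 'id=3']
```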

Tutorial Processes

Use Split Document into Collection to process a document line by line

This tutorial process illustrates how to use the Split Document into Collection operator to process a larger document line by line. A Create Document operator creates an example document containing multiple lines of data. The Split Document into Collection operator then splits this document into a collection with one document per line, and the JSON to Data operator converts the collection into an ExampleSet.
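The same flow can be approximated outside RapidMiner. This Python sketch assumes, as the use of JSON to Data suggests, that each line of the document holds one JSON object; the sample records are invented, and the column ordering below is a simplification rather than the exact output of JSON to Data.

```python
import json

# Hypothetical sample content; the tutorial's actual document text is not shown here.
document = '{"name": "Alice", "age": 34}\n{"name": "Bob", "age": 27}\n{"name": "Carol", "age": 41}'

# Step 1: split the document into a collection, one document per line.
collection = document.split("\n")

# Step 2: parse each document as JSON (skipping blank lines), the role JSON to Data plays here.
records = [json.loads(doc) for doc in collection if doc.strip()]

# Step 3: build a simple table whose columns are the union of all keys, sorted by name.
columns = sorted({key for record in records for key in record})
table = [[record.get(col) for col in columns] for record in records]

print(columns)  # ['age', 'name']
for row in table:
    print(row)
```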